Clean data for Abundance - size spectra and beavers

Author

Leonardo Capitani

Published

August 24, 2025

Modified

August 26, 2025

Load packages

Code
library(tidyverse)
library(here)
library(knitr)
library(ggdist)

Load raw data:

We are going to load the raw data collected by Ph.D. student Valentin Moser and master's students / research assistants.

In the folder data/raw you will find data_arthropods_flying.xlsx

This file contains data collected with traps for flying arthropods like this one:

Dr. Cornelia Twining deploying a flying-arthropod trap in a Swiss stream. Photo retrieved from here.
Code
d <- readxl::read_xlsx(path = here("data", "raw", "data_arthropods_flying.xlsx"), sheet = 1) |>
  select(-c(sort, remarks)) # remove unnecessary columns
glimpse(d)
Rows: 6,681
Columns: 14
$ site        <chr> "Chrie", "Chrie", "Chrie", "Chrie", "Chrie", "Chrie", "Chr…
$ location    <chr> "Control", "Control", "Control", "Control", "Control", "Co…
$ date        <chr> "June", "June", "June", "June", "June", "June", "June", "J…
$ class       <chr> "Arachnida", "Arachnida", "Arachnida", "Arachnida", "Insec…
$ order       <chr> "Araneae", "Araneae", "Araneae", "Araneae", "Coleoptera", …
$ suborder    <chr> NA, NA, NA, NA, "Adephaga", "Phytophaga", "Phytophaga", "P…
$ family      <chr> NA, NA, NA, NA, "Dytiscidae", "Curculionidae", "Curculioni…
$ subfamily   <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ juvenile    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ terrestrial <dbl> 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ juv.aquatic <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ unusable    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ size        <dbl> 2, 2, 2, 2, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ Laufnummer  <dbl> 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137…
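
Before cleaning further, it can help to confirm that the indicator columns hold only 0 or 1 and to spot columns that are entirely missing (`subfamily` is read as logical and shows only NA in the preview above). A minimal sketch in base R, using a small made-up data frame `toy` in place of `d`; the values and the choice of `flag_cols` are assumptions for illustration, not taken from the real file:

```r
# Toy stand-in for `d`; values are invented for illustration only
toy <- data.frame(
  family      = c(NA, "Dytiscidae", "Curculionidae"),
  subfamily   = c(NA, NA, NA),
  juvenile    = c(0, 0, 1),
  terrestrial = c(1, 0, 0),
  size        = c(2, 5, 2)
)

# Indicator columns expected to hold only 0 or 1 (assumed list)
flag_cols <- c("juvenile", "terrestrial")

# TRUE for each indicator column whose non-missing values are all 0 or 1
flags_ok <- sapply(toy[flag_cols], function(x) all(x %in% c(0, 1) | is.na(x)))

# Names of columns in which every value is missing
all_na_cols <- names(toy)[colSums(!is.na(toy)) == 0]

print(flags_ok)
print(all_na_cols)
```

The same two checks could be run on `d` directly by swapping `toy` for `d` and extending `flag_cols` to all four indicator columns.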

Metadata:

Code
data_dict <- data.frame(
  Column_name = c(
    "site", "location", "date", "class", "order", "suborder", "family",
    "subfamily", "juvenile", "terrestrial", "juv.aquatic", "unusable",
    "size", "remarks"
  ),
  Explanation = c(
    "Study system in which the individual was sampled",
    "Site at which individual was sampled (i.e. pool or control)",
    "Collection period during which the individual was sampled (i.e. June or July)",
    "Taxonomic class the sampled individual belongs to",
    "Taxonomic order the sampled individual belongs to",
    "Taxonomic suborder the sampled individual belongs to (where available)",
    "Taxonomic family the sampled individual belongs to (where available)",
    "Taxonomic subfamily the sampled individual belongs to (where available)",
    "Binary value indicating if the sampled individual was a juvenile. 1 = juvenile, 0 = adult",
    "Binary value indicating if the sampled individual was winged. 1 = non-winged, 0 = winged",
    "Binary value indicating if the sampled individual belongs to a taxon with purely aquatic juveniles. 1 = aquatic, 0 = not aquatic",
    "Binary value indicating if the sampled individual came from water (amphipoda, gastropoda)",
    "Length of the sampled individual in mm (rounded to full numbers). Measured from head to end of abdomen, excluding appendages (wings, limbs, antennae, cerci, etc.)",
    "Additional comments or information about a given individual"
  )
)

kable(data_dict, align = c("l","l"), caption = "Metadata for data_arthropods_flying.xlsx")
Metadata for data_arthropods_flying.xlsx

| Column_name | Explanation |
|:------------|:------------|
| site | Study system in which the individual was sampled |
| location | Site at which individual was sampled (i.e. pool or control) |
| date | Collection period during which the individual was sampled (i.e. June or July) |
| class | Taxonomic class the sampled individual belongs to |
| order | Taxonomic order the sampled individual belongs to |
| suborder | Taxonomic suborder the sampled individual belongs to (where available) |
| family | Taxonomic family the sampled individual belongs to (where available) |
| subfamily | Taxonomic subfamily the sampled individual belongs to (where available) |
| juvenile | Binary value indicating if the sampled individual was a juvenile. 1 = juvenile, 0 = adult |
| terrestrial | Binary value indicating if the sampled individual was winged. 1 = non-winged, 0 = winged |
| juv.aquatic | Binary value indicating if the sampled individual belongs to a taxon with purely aquatic juveniles. 1 = aquatic, 0 = not aquatic |
| unusable | Binary value indicating if the sampled individual came from water (amphipoda, gastropoda) |
| size | Length of the sampled individual in mm (rounded to full numbers). Measured from head to end of abdomen, excluding appendages (wings, limbs, antennae, cerci, etc.) |
| remarks | Additional comments or information about a given individual |
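
The dictionary describes the spreadsheet, while `d` is what remains after loading: `remarks` is documented but dropped on load, and `Laufnummer` is loaded but has no dictionary entry. A quick way to surface such differences is to compare the two sets of names. The sketch below uses shortened, hand-typed name vectors (`data_cols`, `dict_cols`) as stand-ins; with the real objects one would presumably pass `names(d)` and `data_dict$Column_name` instead:

```r
# Shortened stand-ins for names(d) and data_dict$Column_name (illustrative)
data_cols <- c("site", "location", "size", "Laufnummer")
dict_cols <- c("site", "location", "size", "remarks")

# Columns present in the data but missing from the dictionary
undocumented <- setdiff(data_cols, dict_cols)

# Columns described in the dictionary but absent from the data
not_in_data <- setdiff(dict_cols, data_cols)

print(undocumented)
print(not_in_data)
```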
Note

Question:

How many taxa are there, and which ones?

Code
# Get unique families (excluding NA)
families <- unique(na.omit(d$family))

# Number of unique families
num_families <- length(families)

# Print results
cat("Number of unique families:", num_families, "\n\n")
Number of unique families: 29 
Code
cat("Families:\n")
Families:
Code
print(sort(families))
 [1] "Aphidoidea"    "Cantharidae"   "Carabidae"     "Chrysomelidae"
 [5] "Coccinellidae" "Cucujidae"     "Curculionidae" "Dytiscidae"   
 [9] "Elmidae"       "Erebidae"      "Forficulidae"  "Formicidae"   
[13] "Gerridae"      "Gyrinidae"     "Haliplidae"    "Hydrophilidae"
[17] "Latridiidae"   "Monotomidae"   "Mordellidae"   "Nitidulidae"  
[21] "Notonectidae"  "Panorpidae"    "Phalacridae"   "Psylloidea"   
[25] "Scirtidae"     "Staphylinidae" "Syrphidae"     "Tabanidae"    
[29] "Vespidae"     
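
Beyond listing the names, it is often useful to see how many individuals fall into each family and how many records lack a family-level identification. A minimal base R sketch on an invented vector `toy_family` (a stand-in for `d$family`; the values are not real counts):

```r
# Invented family labels standing in for d$family
toy_family <- c("Carabidae", "Carabidae", NA, "Gerridae",
                "Carabidae", "Gerridae", NA)

# table() drops NA by default; sort so the most frequent family comes first
family_counts <- sort(table(toy_family), decreasing = TRUE)

# Number of records without a family-level identification
n_missing <- sum(is.na(toy_family))

print(family_counts)
print(n_missing)
```

In the tidyverse style used elsewhere in this document, `count(d, family, sort = TRUE)` should give a comparable summary, with the NA group included as its own row.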

Description of the families we found (Aphidoidea and Psylloidea are superfamilies, recorded here in the family column):

Code
# Families Found Near Streams in Switzerland

families_table <- data.frame(
  Family = c(
    "Aphidoidea", "Cantharidae", "Carabidae", "Chrysomelidae",
    "Coccinellidae", "Cucujidae", "Curculionidae", "Dytiscidae",
    "Elmidae", "Erebidae", "Forficulidae", "Formicidae",
    "Gerridae", "Gyrinidae", "Haliplidae", "Hydrophilidae",
    "Latridiidae", "Monotomidae", "Mordellidae", "Nitidulidae",
    "Notonectidae", "Panorpidae", "Phalacridae", "Psylloidea",
    "Scirtidae", "Staphylinidae", "Syrphidae", "Tabanidae",
    "Vespidae"
  ),
  Description = c(
    "Aphids; plant sap-feeders, often found on riparian vegetation.",
    "Soldier beetles; predatory or nectar-feeding, common in meadows near water.",
    "Ground beetles; many species are predators along stream banks.",
    "Leaf beetles; herbivores on riparian plants.",
    "Lady beetles; mostly aphid predators on vegetation.",
    "Flat bark beetles; live under bark, sometimes in moist riparian wood.",
    "Weevils; herbivores feeding on riparian plants and shrubs.",
    "Predaceous diving beetles; aquatic predators in streams and ponds.",
    "Riffle beetles; aquatic, live attached to stones in running water.",
    "Tiger moths and relatives; larvae feed on diverse plants near water.",
    "Earwigs; omnivores hiding under stones and wood along streams.",
    "Ants; common in soils and vegetation along riparian zones.",
    "Water striders; aquatic predators skating on water surfaces.",
    "Whirligig beetles; fast swimmers on water surfaces in streams.",
    "Crawling water beetles; small herbivorous beetles in shallow water.",
    "Water scavenger beetles; aquatic or semi-aquatic scavengers.",
    "Minute brown scavenger beetles; found in decaying plant matter.",
    "Root-eating beetles; often associated with decaying wood.",
    "Tumbling flower beetles; found on flowers near riparian habitats.",
    "Sap beetles; feed on decaying fruit, fungi, and plant material.",
    "Backswimmers; aquatic predators that swim upside down.",
    "Scorpionflies; scavengers, often in damp shaded stream habitats.",
    "Shining flower beetles; small pollen feeders.",
    "Psyllids; plant sap-feeders, often on riparian trees and shrubs.",
    "Marsh beetles; aquatic or semi-aquatic beetles in wetlands.",
    "Rove beetles; very diverse predators and scavengers in moist habitats.",
    "Hoverflies; larvae of many species are aphid predators, adults visit flowers.",
    "Horse flies; adults feed on blood or nectar, larvae in wet soils.",
    "Vespid wasps (hornets, yellowjackets, potter wasps); predators, adults also feed on nectar."
  )
)

kable(families_table, caption = "Ecological roles of arthropod families sampled close to streams in Switzerland")
Ecological roles of arthropod families sampled close to streams in Switzerland

| Family | Description |
|:-------|:------------|
| Aphidoidea | Aphids; plant sap-feeders, often found on riparian vegetation. |
| Cantharidae | Soldier beetles; predatory or nectar-feeding, common in meadows near water. |
| Carabidae | Ground beetles; many species are predators along stream banks. |
| Chrysomelidae | Leaf beetles; herbivores on riparian plants. |
| Coccinellidae | Lady beetles; mostly aphid predators on vegetation. |
| Cucujidae | Flat bark beetles; live under bark, sometimes in moist riparian wood. |
| Curculionidae | Weevils; herbivores feeding on riparian plants and shrubs. |
| Dytiscidae | Predaceous diving beetles; aquatic predators in streams and ponds. |
| Elmidae | Riffle beetles; aquatic, live attached to stones in running water. |
| Erebidae | Tiger moths and relatives; larvae feed on diverse plants near water. |
| Forficulidae | Earwigs; omnivores hiding under stones and wood along streams. |
| Formicidae | Ants; common in soils and vegetation along riparian zones. |
| Gerridae | Water striders; aquatic predators skating on water surfaces. |
| Gyrinidae | Whirligig beetles; fast swimmers on water surfaces in streams. |
| Haliplidae | Crawling water beetles; small herbivorous beetles in shallow water. |
| Hydrophilidae | Water scavenger beetles; aquatic or semi-aquatic scavengers. |
| Latridiidae | Minute brown scavenger beetles; found in decaying plant matter. |
| Monotomidae | Root-eating beetles; often associated with decaying wood. |
| Mordellidae | Tumbling flower beetles; found on flowers near riparian habitats. |
| Nitidulidae | Sap beetles; feed on decaying fruit, fungi, and plant material. |
| Notonectidae | Backswimmers; aquatic predators that swim upside down. |
| Panorpidae | Scorpionflies; scavengers, often in damp shaded stream habitats. |
| Phalacridae | Shining flower beetles; small pollen feeders. |
| Psylloidea | Psyllids; plant sap-feeders, often on riparian trees and shrubs. |
| Scirtidae | Marsh beetles; aquatic or semi-aquatic beetles in wetlands. |
| Staphylinidae | Rove beetles; very diverse predators and scavengers in moist habitats. |
| Syrphidae | Hoverflies; larvae of many species are aphid predators, adults visit flowers. |
| Tabanidae | Horse flies; adults feed on blood or nectar, larvae in wet soils. |
| Vespidae | Vespid wasps (hornets, yellowjackets, potter wasps); predators, adults also feed on nectar. |
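
The description table can also be attached to the individual records by matching on the family name, so that each row of `d` carries its ecological note. The sketch below shows the idea in base R on two invented data frames, `toy_d` and `toy_desc`, which stand in for `d` and `families_table`; records without a family keep an NA description:

```r
# Invented stand-ins for d and families_table
toy_d <- data.frame(
  family = c("Gerridae", NA, "Carabidae", "Gerridae"),
  size   = c(8, 2, 5, 9)
)
toy_desc <- data.frame(
  Family      = c("Carabidae", "Gerridae"),
  Description = c("Ground beetles", "Water striders")
)

# Left join: keep every record, add the description where the family matches
joined <- merge(toy_d, toy_desc,
                by.x = "family", by.y = "Family", all.x = TRUE)

print(joined)
```

With the tidyverse loaded, `left_join(d, families_table, by = c("family" = "Family"))` should be the equivalent call and keeps the original row order.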

Plot data

Code
# Side-by-side histogram of sizes by location (position = "dodge")
ggplot(d, aes(x = size, fill = location)) +
  geom_histogram(binwidth = 1, color = "black", alpha = 0.7, position = "dodge") +
  labs(
    title = "Histogram of individual sizes by sampled location",
    x = "Size (mm)",
    y = "Frequency",
    fill = "Location"
  ) +
  theme_bw(base_size = 14)
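
The histogram above uses 1 mm bins. Size spectra are commonly summarised on logarithmic size classes instead, with counts divided by bin width so that wide and narrow bins are comparable. This document does not say how the spectra will be built, so the following is only a sketch of one common choice (log2 bins, width-normalised counts) on an invented size vector `toy_size`:

```r
# Invented sizes in mm, standing in for d$size
toy_size <- c(1, 2, 2, 3, 4, 5, 8, 9, 16)

# Bin edges at powers of two: [1,2), [2,4), [4,8), [8,16), [16,32)
breaks <- 2^(0:5)

# Assign each individual to a left-closed bin
bins <- cut(toy_size, breaks = breaks, right = FALSE)

# Abundance per bin, and abundance normalised by bin width
counts     <- as.vector(table(bins))
bin_width  <- diff(breaks)
normalised <- counts / bin_width

print(data.frame(bin = levels(bins), counts, bin_width, normalised))
```

Because sizes were recorded in whole millimetres, the smallest bins contain only one or two distinct values, which is worth keeping in mind when interpreting the lower end of any spectrum.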

Code
ggplot(d, aes(y = location, x = size, fill = location)) +
  stat_halfeye(position = "dodge",
    adjust = 1,       # smoothness of density
    width = 0.6,        # width of half-eye
    justification = -0.1,
    point_interval = mean_qi,  # mean with quantile intervals (ggdist default widths: 66% and 95%)
    alpha = 0.7
  ) +
  labs(
    title = "Size Distributions by Location",
    y = "Location",
    x = "Size (mm)"
  ) +
  theme_bw(base_size = 14)
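
The point and interval drawn by `mean_qi` are a mean and quantile-based intervals. To read the numbers off directly, the same kind of summary can be computed per location. A minimal base R sketch on an invented data frame `toy` (a stand-in for `d`), using the 95% level only; `summarise_size` is a helper defined here for illustration:

```r
# Invented stand-in for d with two locations
toy <- data.frame(
  location = rep(c("Control", "Pool"), each = 5),
  size     = c(1, 2, 3, 4, 5, 2, 4, 6, 8, 10)
)

# Mean plus the 2.5% and 97.5% quantiles (a 95% quantile interval)
summarise_size <- function(x) {
  c(mean  = mean(x),
    lower = unname(quantile(x, 0.025)),
    upper = unname(quantile(x, 0.975)))
}

# One row per location, one column per statistic
size_summary <- t(sapply(split(toy$size, toy$location), summarise_size))

print(size_summary)
```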
